3 research outputs found
Efficient Parallel Random Sampling: Vectorized, Cache-Efficient, and Online
We consider the problem of sampling n numbers from the range 1..N
without replacement on modern architectures. The main result
is a simple divide-and-conquer scheme that makes sequential algorithms more
cache efficient and leads to a parallel algorithm running in expected time
O(n/p + log p) on p processors, i.e., it scales to massively parallel
machines even for moderate values of n. The amount of communication between
the processors is very small (at most O(log p)) and independent of
the sample size. We also discuss modifications needed for load balancing,
online sampling, sampling with replacement, Bernoulli sampling, and
vectorization on SIMD units or GPUs.
Thrill: High-Performance Algorithmic Distributed Batch Data Processing with C++
We present the design and a first performance evaluation of Thrill -- a
prototype of a general purpose big data processing framework with a convenient
data-flow style programming interface. Thrill is somewhat similar to Apache
Spark and Apache Flink with at least two main differences. First, Thrill is
based on C++ which enables performance advantages due to direct native code
compilation, a more cache-friendly memory layout, and explicit memory
management. In particular, Thrill uses template meta-programming to compile
chains of subsequent local operations into a single binary routine without
intermediate buffering and with minimal indirections. Second, Thrill uses
arrays rather than multisets as its primary data structure which enables
additional operations like sorting, prefix sums, window scans, or combining
corresponding fields of several arrays (zipping). We compare Thrill with Apache
Spark and Apache Flink using five kernels from the HiBench suite. Thrill is
consistently faster and often several times faster than the other frameworks.
At the same time, the source codes have a similar level of simplicity and
abstraction.